Corpus-based Lexicography for Lesser-resourced Languages — Maximizing the Limited Corpus
Similar resources
The Crúbadán Project: Corpus building for under-resourced languages
We present an overview of the Crúbadán project, the aim of which is the creation of text corpora for a large number of under-resourced languages by crawling the web.
Corpus based coreference resolution for Farsi text
Coreference resolution, that is, finding all expressions in a text that refer to the same entity, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Mining Word Senses from Text for Corpus-Based Lexicography
This paper discusses the problem of automated lexicography. In the corpus-based approach, a lexicographer has to manually group the contexts of a target word into clusters in order to identify word senses. When a large number of contexts is given, this process becomes a tedious and time-consuming task. To overcome this problem, we propose an efficient technique based on unsupervised clustering....
On using spoken data in corpus lexicography
Corpora are increasingly used in lexicography in order to provide good evidence for dictionary statements: the inclusion of spoken data in corpora is generally considered important. This paper raises some issues connected with the use of spoken data. It points out that the extensive differences between written and spoken language have great consequences for dictionary-making. It argues that the...
Journal
Journal title: Lexikos
Year: 2015
ISSN: 2224-0039
DOI: 10.5788/25-1-1300